ABSTRACT

The term "visual literacy" generally refers to the
interpretation of the formal structure of film or television and
carries with it the notion that the interpreter has knowledge of the
use of camera angles, lighting, flashbacks, and so forth. However,
many visual conventions encountered in movies or television may be
interpreted even by a "naive" viewer with no previous experience in
media conventions with the use of general cognitive skills. For
example, when seeing a character filmed from a low camera angle, even
naive viewers can understand that the character is meant to appear
powerful because viewers are accustomed to looking up to powerful
people. Similarly, viewers' cognitive skills let them interpret
subjective and objective shots as the camera switches from a
character's point of view to a view of the scene itself. Finally,
viewers are sensitive to contextual cues in nonverbal communication,
so that when shots of a character's face are intercut with shots of
an object of interest or a listener's face, viewers can perceive
nuances of meaning from the juxtaposed images. Both "reality" and
exposure to film or television are potential avenues to the
interpretation of visual cues, and both sources of interpretational
competence conceivably work together, interacting in a complex
fashion to help viewers understand what they see. (Twelve references
are included.) (JC)
THE ROLE OF VISUAL "LITERACY" IN FILM COMMUNICATION

I am using the term "visual literacy" to refer to the notion 
that the interpretation of film or television — more precisely,
of the formal structure of a movie or TV program — depends on
prior familiarity with a set of formal conventions (i.e., the
conventional uses and meanings of such things as close-ups,
point-of-view shots, slow motion, flashbacks, etc.). I think
it's fair to say that this notion is accepted almost
axiomatically by most people who have written or thought about
the issue (cf. Carey, 1982; Worth, 1981). Although such people 
are often quite sensitive to the dangers of overstating possible 
analogies between visual media and language, the specific analogy 
implied by the notion of visual "literacy" — the idea that 
competent visual interpretation presupposes a type of learning 
which is comparable to the learning of a language — is typically 
not treated as a controversial assumption. Nonetheless, this 
paper is based on the premise that this standard view of visual 
interpretation probably does overstate the similarity between 
visual media and language in this area. Specifically, I shall 
argue that many visual conventions routinely encountered in 
movies or TV programs may be interpretable even by a "naive," 
first-time viewer with no previous experience of these media.

This point can be argued on both empirical and theoretical 
grounds. My aim in this paper is exclusively theoretical, but it 
is worth mentioning that the available empirical evidence 
indicates that at least two of the most crucial production
variables — changes in camera-to-subject distance; changes in
point of view — do not pose any significant interpretational
problems to inexperienced viewers (Messaris, 1982). In other
words, there is evidence that, for some visual conventions at 
least, general cognitive skills — skills whose application must 
extend beyond the range of film and television and which a viewer 
could be expected to have even before his/her first encounter 
with these media — may serve as a basis for interpretation. My 
primary purpose in this paper is to spell out what some of these 
cognitive skills might be. Specifically, I want to discuss the 
following three types, which I consider to be central components
of a competent viewer's repertory: (a) analogical thinking —
the ability to perceive a formal analogy between a visual device
and some aspect of everyday experience; (b) spatial intelligence
— the ability to derive a coherent sense of a three-dimensional
scene out of a limited number of partial views of that scene;
(c) sensitivity to contextual information in the interpretation
of nonverbal behavior. In discussing these cognitive skills, I
shall be developing a theoretical account of visual 
interpretation which differs from the more traditional, 
"visual-literacy" approach. This is not to say that the two 
approaches are totally incompatible. In fact, I shall describe 
ways in which they might complement each other. However, to the 
extent that the approach I am suggesting offers a valid 
interpretation of the viewing process, one of the main 
implications of the "visual-literacy" approach — the idea that
comprehension of film and television requires extensive previous 
exposure — will have to be revised. 

(a) Analogical Thinking. One of the clearest examples of a
formal convention which is encountered very widely in both film
and television is that of the use of camera angle as a means of
making someone look powerful or powerless. In other words, what
I'm referring to is the well-known principle of using a low camera
angle — shooting from below — to make the person in the shot
appear more powerful (or menacing, threatening, etc., depending
on the exact context); and, conversely, using a high camera
angle — shooting from above — to make the person in the shot 
appear weaker, etc. I would assume that even someone with no 
formal background in film or TV scholarship would readily 
recognize this convention, since it is, as indicated, in very 
wide use. Now, the question that I am interested in is this: 
How does the viewer come to understand this convention when he or 
she sees it in a film or TV program? What previous knowledge or 
experience must the viewer have in order to be able to respond to 
this use of camera angle in the appropriate manner (and, when I 
say appropriate, I mean: as called for by the convention)? 

The standard response to this question would be that the 
viewer would have to have had a number of previous encounters 
with this use of camera angle, in the course of which he or she 
would gradually — or maybe not so gradually — have acquired a
sense of this device's meaning. For example, we may imagine a
child gradually coming to associate low angles with shots of
villains in threatening postures and thereafter responding to the 
angle in and of itself in the appropriate way. This certainly 
seems like a plausible possibility, and, in fact, I'm sure that 
it does indeed happen to a certain extent. 

At the same time, however, it seems to me that there is an 
alternative route to the interpretation of this kind of use of 
camera angle — and, if I am correct, this alternative route 
would not require any previous exposure to this specific device. 
This alternative route is based on the fact — or, what I would 
argue is the fact — that the camera-angle convention is not an 
arbitrary convention (in other words, it's not like the word
"powerful," whose form is unrelated to the concept it denotes);
instead, I would argue that the particular use of camera angle 
which we've been examining derives its meaning by analogy with 
real-life situations of looking up at powerful people or looking 
down at weak people (cf. Schwartz, 1981) — a realm of
experience that is likely to be particularly relevant to the 
formative years of childhood. If this assumption is correct, its 
implication is that a viewer should be able to respond 
appropriately to camera angle on the basis of the analogy with 
real-life experience, without any necessary previous exposure to 
the use of camera angle in film or television. In other words, 
here we have an example of a formal device which may be 
interpretable on the basis of a general cognitive skill, namely, 
sensitivity to visual analogy — or, perhaps, even more
generally, aptitude in analogical thinking. This alternative 
possibility does not preclude the more standard approach, and, in 
fact, it seems quite possible that general sensitivity to visual 
analogy may develop from the specific experience of camera angle
and other similar conventions. Nevertheless, as interpretational 
mechanisms, these two alternatives are certainly distinct. 

While camera angle may be one of the clearest examples of a 
device which draws on analogy for its meaning, in my view it is 
certainly not the only one. Indeed, I would argue that the use 
of analogical constructions is one of the distinctive features of 
film and television as modes of communication. Let me list, very 
briefly, some other formal conventions whose meaning appears to 
derive from analogy with some aspect of real-life experience. At
a minimum, such a list would include the following:
camera-to-subject distance (i.e., the use of close-up vs. medium
shot, etc.) as a means of emphasis or as a means of generating
intimacy and identification with a character on the screen (as 
Meyrowitz [1986] has argued in an extended analysis of this 
variable, it appears to derive its meaning and its effectiveness 
from an analogy to the real-life area of proxemic behavior); the
use of camera movement to simulate a character's subjective 
visual experience in point-of-view shots; framing a shot so as 
to magnify the size of important characters or objects; rapid 
cutting as a method of increasing the impact of action sequences; 
"Eisensteinian" editing, in which objects are juxtaposed on the 
basis of conceptual analogy (e.g., the famous example, from 
Eisenstein's Strike, of massacred strikers juxtaposed with
animals in a slaughterhouse).

(b) Spatial Intelligence. As the above list indicates, the 
scope for analogical thinking in the interpretation of film and 
television appears to be quite extensive. However, in my view, 
the cognitive skill which is of greatest importance to film or TV 
interpretation is probably not analogical thinking but, rather, 
spatial intelligence. As conceived of by cognitive psychologists
(e.g., see Gardner, 1983), spatial intelligence comprises a
cluster of related cognitive abilities, of which the most
crucial, for our purposes, is the ability to derive a coherent
sense of a three-dimensional scene out of a limited number of 
partial views of that scene. Anyone familiar with cognitive 
psychology will recognize here an area of intelligence which is 
typically tapped through such measures as Piaget's three-mountain 
task: A child is shown a certain view of a mountainous landscape 
and asked to indicate how the mountains would appear from a 
different viewing position. Although I do not think that this 
specific situation has an exact parallel in film or TV 
interpretation, the general skill of spatial integration on the 
basis of partial views is brought into play every time the action 
in a scene is "interrupted" by a cut from one point of view to
another. Of course, such transitions need not be extreme.
Often, all that is involved is a small reorientation of the
camera back and forth between two people having a conversation.
On the other hand, when it comes to action sequences, or such 
things as a switch from an "objective" view to a "subjective"
shot (i.e., the point of view of a character in the scene itself),
the change in point of view can be quite radical — and, 
presumably, quite demanding with regard to the viewer's spatial 
intelligence. 

As I have indicated, I think that spatial intelligence may be 
the most important component of a competent viewer's 
interpretational abilities. This judgment is based on an analysis
of the kinds of editing which a viewer is likely to encounter in
typical fictional TV programs. We examined a convenience sample
of nine TV programs: three daytime soap operas, three sit-coms,
and three police/adventure shows. Our analysis was concerned
with the editing. We looked at each shot transition (cut,
fade-out/fade-in, dissolve, etc.) and classified them into five
overall categories, of which the only one which is relevant for
our purposes was the first: a transition within a single
location, from one point of view to another. Overall, an average
of ninety-five percent of the transitions fell into this
category. (Average N for total transitions = 559 for soap
operas; 250 for sit-coms; 398 for detective shows.) In other 
words, by an overwhelming majority, the kind of editing 
transition which a viewer is likely to be confronted with in a
typical fictional TV program is precisely the kind of transition 
for which spatial intelligence is the relevant interpretational 
skill. All the other editing devices — time/space changes, 
flashbacks, etc. — (which sometimes seem to get the lion's
share of attention from scholars) are in fact a tiny minority of
the whole.

The relevance of spatial intelligence to film or TV viewing
has received considerable attention from cognitive psychologists,
and there are several studies suggesting or demonstrating a link
between TV experience and performance on Piagetian or other tests
of this cognitive skill (e.g., Salomon, 1979; Tidhar, 1984;
Wachtel, 1984). A review of this research is beyond the scope of
this presentation, but the general finding — namely, that TV
viewing can influence spatial intelligence — suggests a
two-sided conclusion to what has been said so far: On the one
hand, the major thrust of this presentation has been to argue
that, when certain cognitive skills precede film or TV
experience, they can provide an avenue to interpretation in the 
absence of specific familiarity with the formal conventions of 
these media. On the other hand, to the extent that spatial 
intelligence — and perhaps other cognitive skills — are
developed further through the viewing experience itself, we can 
say that competence in film or TV interpretation is actually a 
form of more general intelligence. 

(c) Sensitivity to Contextual Cues in Nonverbal
Communication. The creation of a coherent space/time continuum
out of the fragments presented in a movie or TV program is one of
the central intellectual tasks which visual media demand of their
viewers. However, the point of editing is not always that of
linking time frames and points of view. A second major purpose
— especially in dramatic contexts — is that of revealing
characters' thoughts, intentions, and personalities. This
possibility was one of the earliest discoveries in the history of 
explicit theorizing about the movies. Its formulation is usually 
associated with Lev Kuleshov and other filmmakers working during 
the early years of Soviet cinema. In its best-known incarnation, 
the so-called "Kuleshov effect" is illustrated in Kuleshov's
experiment involving an "expressionless" close-up of the Russian
actor Mozhukhin juxtaposed with a variety of other scenes,
including a plate of soup on a table, a corpse in a coffin, and a
little girl playing with a toy bear. According to Kuleshov's
colleague V.I. Pudovkin, to whom we owe the description of this
experiment, viewers who saw these sequences without having been
told about the editing responded with enthusiastic praise for
Mozhukhin's acting. In other words, the editing led these
viewers to see subtle changes in expression — from thoughts of
food to deep sorrow to a "light, happy smile" — where in fact
there were none (Pudovkin, 1976, p. 168).

The general category of juxtapositions explored in this 
experiment (and others which followed it) is a firmly established 
feature of film and TV editing, occurring most commonly, perhaps, 
in the conventional "reaction-shot" sequence, in which shots of a
speaker or other object of interest are intercut with shots of a 
listener or observer. Such sequences are a typical ingredient of 
dialogue scenes in fiction films, as well as of "non-fictional" 
dialogues in talk shows and other TV programs, but the potential 
role of image juxtaposition as an indicator of characters'
thoughts or reactions is probably most evident in the absence of
dialogue and in those "non-fictional" cases in which a certain
sequence of events is rearranged through editing (as in the many
instances in which an interviewer's "reactions" are inserted into
a TV interview after the fact). Assuming, as the evidence
suggests, that viewers typically do use the juxtaposition of
images — rather than just the facial expressions in them — as
clues to what lies "beneath the surface" of characters' faces, we 
are confronted with another cluster of visual conventions based 
on a single general principle. What might account for viewers' 
ability to make sense of these conventions? 

One possibility, as always, is that entailed in the notion of
visual "literacy," namely, previous experience with the
conventions in question. On the other hand, this is an area in
which a ready parallel with a set of "real-life" cognitive skills
suggests itself. Although the precise visual sequence which the
viewer is confronted with on the screen — a view of a character 
juxtaposed with a view of some object or situation of interest to 
that character — may not have an exact parallel in reality, the 
basic inferential task which the viewer has to perform in the 
case of the film or TV sequence is similar to an extremely common 
real-life task, namely, that of judging other people's intentions 
from the context of their behavior. The degree to which this
process is central to interpersonal communication bears some 
emphasis. As researchers in the areas of nonverbal communication 
and of person perception have noted, people's appearance,
expressions, and actions are frequently ambiguous, or even
completely opaque, in the absence of information about the
objects or situations to which they are addressed. Indeed,
Birdwhistell (1970) has argued that no facial expression or 
gesture has a determinate meaning out of context. The ability to 
take context into account in inferring thoughts and assessing 
intentions is consequently a vital component of any mature 
person's social skills. It is conceivable, therefore, that this
ability — rather than any direct experience with editing 
conventions — may serve as the basis of the interpretational 
competence called for by the kinds of editing we are concerned 
with here. 

This notion — that the "Kuleshov effect" and related
cinematic devices are derivatives of the real-life dependence of
meaning on context — is consistent with the implications of
another, less widely known, experiment by the Russian filmmaker. 
In this experiment, Kuleshov filmed an actor in two roles: 
first, in a jail cell, as a famished prisoner being offered a 
bowl of soup; second, as a prisoner released from jail and taken 
out into the open air. The actor was invited to use every means 
at his disposal to express the sentiments appropriate to these 
two situations: on the one hand, craving for the soup; on the 
other, delight at the sight of birds, clouds, the sun. Then 
Kuleshov produced various versions of the two scenes, in some of 
which the shots of the actor were transposed from one scene to 
the other. By his own account, regardless of how the scenes were
scrambled, viewers were unable to detect any discrepancy in the
actor's performance (Kuleshov, 1974, p. 54). In other words,
despite the fact that the actor had a clear and distinct
sentiment in mind in each case, his facial expressions in
themselves were apparently incapable of conveying a specific
enough sense of his thoughts, and the viewers' ultimate
interpretations were evidently fixed by the overall context.
This is essentially the point which investigators of "real-life"
social perception have made about the information available in 
facial expressions and other overt indicators of thought and 
intention. Unlike the more famous Kuleshov experiment mentioned 
earlier, whose use of an unvarying, "neutral" expression might be 
seen as somewhat artificial, this one is based on a closer 
approximation of "real-life" conditions — in the sense that the
actor's performance was allowed to vary with the situation — and
it therefore makes the potential relationship between "real life"
and this aspect of movie viewing clearer.

As this discussion suggests, then, both "reality" and 
exposure to film or television are potential avenues to the
interpretation of the kinds of visual devices we have been
considering. One possibility does not necessarily exclude the
other, of course. It is conceivable that these two sources of
interpretational competence might work together, either by
reinforcing each other or by interacting in a more complex
fashion. For example, previous exposure to editing might teach a
viewer which juxtapositions of images to look at for
psychological implications, while "real-life" social experience 
might guide the actual inferences drawn from those 
juxtapositions. In the absence of research aimed specifically at 
disentangling these possibilities — for example, a study of
"naive" adult viewers' susceptibility to the "Kuleshov effect" —
it is unclear that one can be more specific about either the
necessary preconditions of these aspects of visual interpretation 
or the typical mix of experiences leading to them. However, what 
we can say is that, to the extent that the connections which have 
been drawn here between general (real-life) cognitive skills and 
visual conventions are valid, the interpretation of these 
conventions should be at least partially accessible even to an
inexperienced, first-time viewer. 